
Use a ring channel to avoid blocking write of events #2082

Merged

aledbf merged 2 commits into kubernetes:master from the ring branch on Feb 14, 2018

Conversation

@aledbf (Member) commented Feb 13, 2018

Which issue this PR fixes:

fixes #2022
closes #2081

@k8s-ci-robot added the "cncf-cla: yes" label (Indicates the PR's author has signed the CNCF CLA) on Feb 13, 2018
@k8s-reviewable: This change is Reviewable

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aledbf

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these OWNERS files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the "approved" (Indicates a PR has been approved by an approver from all required OWNERS files) and "size/M" (Denotes a PR that changes 30-99 lines, ignoring generated files) labels on Feb 13, 2018
@@ -106,7 +107,7 @@ func NewNGINXController(config *Configuration, fs file.Filesystem) *NGINXControl
 		}),

 		stopCh:   make(chan struct{}),
-		updateCh: make(chan store.Event, 1024),
+		updateCh: channels.NewRingChannel(4096),

does this specify a limit of 4096? If so, the initial load does not appear to pop items off the ring until the cache has been initialized. This still might cause a problem. Can we lower to 1024 to confirm?

@aledbf (Member, Author):

yes, let me lower that value and create a different docker image

@azweb76 Feb 13, 2018

cool. still works, my guess is the ring can expand beyond 1024.

@aledbf (Member, Author) commented Feb 13, 2018

@azweb76 please use quay.io/aledbf/nginx-ingress-controller:0.326 (new value is 1024)

@azweb76 commented Feb 13, 2018

Correct me if I'm wrong: on startup all events are written to the ring, and only later popped off the ring.
https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/nginx.go#L246-L273

Given this, only a maximum of 1024 events will be tracked in sync. If I understand correctly, the ring is initialized as 1024/4096/etc. and if the cluster exceeds this, the first N items are dropped before they are processed.
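For readers following the thread, here is a minimal, self-contained sketch of the drop-oldest behaviour being described, using the eapache/channels package referenced in the diff (the tiny capacity of 4 and the integer "events" are just for the demo, not the controller's actual setup). Writes to In() never block; once the ring is full, the oldest entries are discarded:

```go
package main

import (
	"fmt"

	"github.com/eapache/channels"
)

func main() {
	// Same idea as updateCh in the diff, but with a tiny capacity for the demo.
	ring := channels.NewRingChannel(4)

	// Produce more events than the ring can hold. None of these sends block,
	// unlike a plain buffered channel, which would stall the producer here.
	for i := 0; i < 10; i++ {
		ring.In() <- i
	}
	ring.Close() // flush whatever is still buffered to Out(), then close it

	// Only the newest events survive; the earliest ones were silently dropped.
	for v := range ring.Out() {
		fmt.Println("received:", v)
	}
}
```

With nothing reading Out() while the producer runs, the demo prints only the last few values, which matches the "first N items are dropped before they are processed" description above.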

@azweb76 commented Feb 13, 2018

Seems like we need to start processing the channel ring before we append to it so we can treat it like a buffer on initial startup.

@aledbf (Member, Author) commented Feb 13, 2018

> Given this, only a maximum of 1024 events will be tracked in sync. If I understand correctly, the ring is initialized as 1024/4096/etc. and if the cluster exceeds this, the first N items are dropped before they are processed.

Yes. That does not mean we are not handling the events. Keep in mind we use a work queue for the update of the configuration, and we discard all the events received in a time window (all the events that arrive between the time we start and finish an update).

@aledbf (Member, Author) commented Feb 13, 2018

> Seems like we need to start processing the channel ring before we append to it so we can treat it like a buffer on initial startup.

This is an example of the chicken-and-egg problem: without the store we don't have events, and without the syncQueue we cannot start to process updates, but the syncQueue depends on the store.

Using the ring channel is the right solution because it allows us to discard some of the initial events.

@azweb76 commented Feb 13, 2018

Sure, after it has started. But this loop is what processes those items, and since the loop doesn't start until all initial events are loaded, only up to the updateCh limit is processed.

https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/nginx.go#L302
https://github.com/eapache/channels/blob/master/ring_channel.go#L92
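A hedged, self-contained paraphrase of the consumer pattern those links point at (event, stopCh, and the enqueue call are simplified stand-ins, not the controller's actual types): nothing is popped off the ring until a loop like this is running, which is exactly the startup window being discussed.

```go
package main

import (
	"fmt"

	"github.com/eapache/channels"
)

type event struct{ obj string } // simplified stand-in for store.Event

func main() {
	updateCh := channels.NewRingChannel(1024)
	stopCh := make(chan struct{})

	// Producer: informer callbacks would push events here; writes never block.
	go func() {
		for i := 0; i < 5; i++ {
			updateCh.In() <- event{obj: fmt.Sprintf("ingress-%d", i)}
		}
		updateCh.Close() // flush any remaining events to Out(), then close it
	}()

	// Consumer: drain Out(), type-assert back to the event type, and hand the
	// object to the sync queue (here just a print) until told to stop.
	for {
		select {
		case raw, ok := <-updateCh.Out():
			if !ok {
				return // ring drained and closed
			}
			if evt, ok := raw.(event); ok {
				fmt.Println("enqueue for sync:", evt.obj)
			}
		case <-stopCh:
			return
		}
	}
}
```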

@k8s-ci-robot added the "size/XXL" label (Denotes a PR that changes 1000+ lines, ignoring generated files) and removed the "size/M" label on Feb 13, 2018
@aledbf (Member, Author) commented Feb 13, 2018

@azweb76 what are you proposing exactly?

@azweb76 commented Feb 13, 2018

I greatly appreciate the effort you're putting into this fix. I just want to make sure we're not introducing another bug. You certainly know more about this than I do, so I will be quiet now. Thanks again!

@aledbf (Member, Author) commented Feb 13, 2018

> I greatly appreciate the effort you're putting into this fix

The update channel was introduced because of the store refactor. This is something we needed because it was almost impossible to write tests for the informers and all the logic behind the sync process.
So I have to fix this; it's not optional :)

@aledbf (Member, Author) commented Feb 13, 2018

The issue in #2022 is related to the number of events we receive at startup, which causes write contention on the channel (we exceed the buffer size). The change introduced here allows us to discard events when we exceed the defined size. As I mentioned previously, we don't need all the events (from the start), but at the same time we cannot run any filter at this point because of the lack of context.

If you run the test image and increase the log level to 3 using the flag --v=3, you should see lots of "skipping %v sync (%v > %v)" messages. That means those events in the work queue were received in the middle of an update and do not really need to be processed (this avoids unnecessary reloads).
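A hedged sketch of the coalescing behind that log message (the names here are hypothetical, not the actual task queue implementation): each queued item carries the time it was enqueued, and the worker skips items enqueued before the last sync started, because that sync already picked up their changes.

```go
package main

import (
	"fmt"
	"time"
)

// task is a hypothetical stand-in for an item in the sync work queue.
type task struct {
	key string
	ts  time.Time // when the triggering event was enqueued
}

func main() {
	var lastSync time.Time

	process := func(t task) {
		if t.ts.Before(lastSync) {
			// Roughly what the "skipping %v sync (%v > %v)" message reports:
			// the event arrived while a newer sync was already in progress.
			fmt.Printf("skipping %s sync\n", t.key)
			return
		}
		fmt.Printf("syncing %s\n", t.key)
		time.Sleep(10 * time.Millisecond) // stand-in for the actual reload work
		lastSync = time.Now()             // items queued before this point are now stale
	}

	// Two events queued back to back: the first triggers a sync, the second is
	// already covered by that sync and is skipped instead of causing a reload.
	queue := []task{
		{key: "ingress-a", ts: time.Now()},
		{key: "ingress-b", ts: time.Now()},
	}
	for _, t := range queue {
		process(t)
	}
}
```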

@azweb76 commented Feb 13, 2018

awesome. Looking forward to the release. 😄

@aledbf merged commit 9bcb5b0 into kubernetes:master on Feb 14, 2018
@aledbf deleted the ring branch on February 15, 2018 04:11